Bash Command Line

Basics

ls -F show files and directories (-F marks directories with a trailing /)

ls -F -a == ls -Fa show all files and directories, including hidden ones (their names start with a .)

cd —> to home directory (user)

cd / —-> to the top unix directory (root)

cd - —> back to previous directory

man —-> manual page

  • q quit
  • b previous page
  • space next page

ls --help

cd c then press Tab —-> shows all possible completions starting with 'c'

Creating things

mkdir

nano —- create and edit text files

cat ——- show file content

rm -r —- recursively remove directories

rmdir —- remove directories (only works when the directory is empty)

ls -l —- file attributes

mv

  • mv chapter1/draft.txt chapter1/backup.txt ./ (move files)
  • rename files

cp

Compress and extract files

tar cvf data.tar data-shell/

  • cvf —- c create an archive, v verbose (list files as they are processed), f use the following name for the archive file; here all files in data-shell/ go into data.tar
  • xvf —- x extract (unpack) an archive
  • cvfz —- archive and compress with gzip
    • tar cvfz data.tar.gz data-shell/
  • xvfz —- decompress the gzip file and unpack the tar archive

gzip

  • compresses a file into '.gz' format (more efficient compression)
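For example, a minimal hedged sketch (the file name is hypothetical):

gzip notes.txt        # produces notes.txt.gz and removes the original
gunzip notes.txt.gz   # restores notes.txt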

unzip

  • Unzip the zip file

Ctrl + L —- clear the screen

history —- show history commands

tar is by far the most widely used archiving tool on UNIX-like systems. Since it was originally designed for sequential write/read on magnetic tapes, it does not index data for random access to its contents. A number of 3rd-party tools can add indexing to tar. However, there is a modern alternative called DAR (stands for Disk ARchiver) that has some nice features:

  • each DAR archive includes an index for fast file list/restore,
  • DAR supports full / differential / incremental backup,
  • DAR has built-in compression on a file-by-file basis, which makes it more resilient against data corruption and avoids compressing already compressed files such as video,
  • DAR supports strong encryption,
  • DAR can detect corruption in both headers and saved data and recover with minimal data loss,

and so on. Learning DAR is not part of this course. In the future, if you want to know more about working with DAR, please watch our DAR webinar (scroll down to see it).

File Transfer

Securely transfer files between a remote system and the local system:

scp : needs an SSH (RSA) key
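For example, a hedged sketch (user name and paths are hypothetical; the host follows the course cluster used below):

[local]$ scp draft.txt userXXX@cassiopeia.c3.ca:work/      # upload a file
[local]$ scp userXXX@cassiopeia.c3.ca:work/results.txt .   # download a file
[local]$ scp -r data/ userXXX@cassiopeia.c3.ca:            # copy a whole directory with -r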

sftp

scp is useful, but what if we don’t know the exact location of what we want to transfer? Or perhaps we’re simply not sure which files we want to transfer yet. sftp is an interactive way of downloading and uploading files. Let’s connect to a cluster with sftp:

[local]$ sftp userXXX@cassiopeia.c3.ca

This will start what appears to be a shell with the prompt sftp>. However, we only have access to a limited number of commands. We can see which commands are available with help:

sftp> help
Available commands:
bye Quit sftp
cd path Change remote directory to 'path'
chgrp grp path Change group of file 'path' to 'grp'
chmod mode path Change permissions of file 'path' to 'mode'
chown own path Change owner of file 'path' to 'own'
df [-hi] [path] Display statistics for current directory or
filesystem containing 'path'
exit Quit sftp
get [-afPpRr] remote [local] Download file
reget [-fPpRr] remote [local] Resume download file
reput [-fPpRr] [local] remote Resume upload file
help Display this help text
lcd path Change local directory to 'path'
lls [ls-options [path]] Display local directory listing
lmkdir path Create local directory
ln [-s] oldpath newpath Link remote file (-s for symlink)
lpwd Print local working directory
ls [-1afhlnrSt] [path] Display remote directory listing
...

Notice the presence of multiple commands that make mention of local and remote. We are actually browsing two filesystems at once, with two working directories!

sftp> pwd    # show our remote working directory
sftp> lpwd # show our local working directory
sftp> ls # show the contents of our remote directory
sftp> lls # show the contents of our local directory
sftp> cd # change the remote directory
sftp> lcd # change the local directory
sftp> put localFile # upload a file
sftp> get remoteFile # download a file

And we can recursively put/get files by just adding -r. Note that the directory needs to be present beforehand:

sftp> mkdir content
sftp> put -r content/

To quit, type exit or bye.

Exercise: Using one of the above methods, try transferring files to and from the cluster. For example, you can download bfiles.tar.gz to your laptop. Which method do you like best?

Note on Windows:

  • When you transfer files from a Windows system to a Unix system (Mac, Linux, BSD, Solaris, etc.), this can cause problems. Windows encodes its files slightly differently than Unix and adds an extra character to every line.
  • On a Unix system, every line in a file ends with a \n (newline). On Windows, every line in a file ends with a \r\n (carriage return + newline). This causes problems sometimes.
  • You can identify if a file has Windows line endings with cat -A filename. A file with Windows line endings will have ^M$ at the end of every line. A file with Unix line endings will have $ at the end of a line.
  • Though most modern programming languages and software handles this correctly, in some rare instances, you may run into an issue. The solution is to convert a file from Windows to Unix encoding with the dos2unix filename command. Conversely, to convert back to Windows format, you can run unix2dos filename.
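For example, to check and fix the line endings of a file copied from Windows (a hedged sketch; the file name is hypothetical):

cat -A report.txt    # Windows line endings show up as ^M$ at the end of each line
dos2unix report.txt  # convert to Unix line endings in place
unix2dos report.txt  # convert back to Windows line endings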

Note on syncing: there is also a command rsync for syncing two directories. It is super useful, especially for work in progress. For example, you can use it to download all the latest PNG images from your working directory on the cluster.
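For instance, a hedged sketch (user name, host, and paths are hypothetical):

[local]$ rsync -av userXXX@cassiopeia.c3.ca:work/'*.png' .        # download new/changed PNG files only
[local]$ rsync -av --dry-run thesis/ userXXX@cassiopeia.c3.ca:    # preview what an upload would transfer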

Tapping the power of Unix

Wildcards, redirection to files and pipes

ls p*

ls *th*

wc —word count

  • wc ethane.pdb
    • 12 84 622 ethane.pdb
    • number of lines, number of words, number of characters
    • here also 622 bytes
  • wc -l *.pdb
  • wc -l *.pdb > list.txt
    • write the output into the file (standard output redirection to a file)

sort -n list.txt > sort.txt

  • -n numerically

head -3 sort.txt

  • print first 3 lines of the file

tail -3 sort.txt

  • print last 3 lines of the file

Constructing complex commands with Unix pipes

For example,

wc -l *.pdb > list.txt

sort -n list.txt > sort.txt

head -1 sort.txt

Can we construct these three lines into a single command?

wc -l *.pdb | sort -n | head -1

Aliases

Aliases are one-line shortcuts/abbreviations to avoid typing a longer command, e.g.

$ alias ls='ls -aFh'
$ alias pwd='pwd -P'
$ alias hi='history'
$ alias top='top -o cpu -s 10 -stats "pid,command,cpu,mem,threads,state,user"'
$ alias cedar='ssh -Y cedar.computecanada.ca'
$ alias weather='curl wttr.in/vancouver'
$ alias cal='gcal --starting-day=1' # starts on Monday

Now, instead of typing ssh -Y cedar.computecanada.ca, you can simply type cedar. To see all your defined aliases, type alias. To remove, e.g. the alias cedar, type unalias cedar.

You may want to put all your alias definitions into the file ~/.bashrc which is run every time you start a new local or remote shell.
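For example, a minimal hedged sketch of making an alias permanent (the alias itself is just an illustration):

echo "alias ll='ls -alF'" >> ~/.bashrc   # append the definition to ~/.bashrc
source ~/.bashrc                         # reload the file in the current shell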

Bash Loops

echo —- print whatever follows the command

To print the value of a variable, we need echo $variable

An easy example of a for loop:

for file in *.dat
do
echo $file
done

or we could write this easy example in one line by using semicolons:

for file in *.dat; do echo $file; done

A collection of items is required after in.

We can create collections in several ways, for example:

echo {1..10}
echo {1,2,5}
echo {a..c}

The above three commands will create and output these collections:

1 2 3 4 5 6 7 8 9 10

1 2 5

a b c

Note that the collection is a list of separate words, not a single string.

substrings

${variable:0:3} —— the first 3 characters of string variable

Example: extract characters from strings
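A minimal hedged sketch (the variable value is arbitrary):

molecule="ethane.pdb"
echo ${molecule:0:3}   # eth  (the first 3 characters)
echo ${molecule:2:4}   # hane (4 characters starting at offset 2)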

Exercise 1

Writing your settings into ~/.bashrc is better (rather than into ~/.bash_profile).

> v.s. >>

  • > will overwrite the contents of the file
  • >> will append the content to the end of the file
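A quick hedged illustration (hypothetical file name):

echo first line > notes.txt     # creates or overwrites notes.txt
echo second line >> notes.txt   # appends to notes.txt
cat notes.txt                   # shows both lines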

diff — compare files and folders

touch

touch a{1..100}.txt —— create 100 empty files named a1.txt ... a100.txt

echo {a..z}{1..2} ---- a1 a2 b1 b2 .....

echo a{1..3}.{txt,py} ---- a1.txt a1.py a2.txt a2.py a3.txt a3.py

ps —- show processes

ps aux —- show processes of all users

kill

kill PID

kill -9 PID ——> force kill (the process cannot catch or ignore this signal)

uniq —- filter out adjacent duplicate lines (usually used after sort)
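For example, a hedged sketch (hypothetical file; uniq only removes adjacent duplicates, so sort first):

sort names.txt | uniq      # print unique lines
sort names.txt | uniq -c   # print unique lines with their counts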

rsync

rsync -Pva --inplace user120@cassiopeia.c3.ca:thesis/ .

Using Unix pipes, write a one-line command to show the name of the longest *.pdb file (by the number of lines). Paste your answer here.

wc -l *.pdb | sort -n | tail -2 | head -1   # tail -2 | head -1 skips the 'total' line that wc adds at the end

PS1="\u@\h \w> " —- changing the prompt

This change applies only to the current shell.

[user144@login1 ~]$ echo tmp/data-shell/molecules/*
tmp/data-shell/molecules/a.txt tmp/data-shell/molecules/cubane.pdb tmp/data-shell/molecules/ethane.pdb tmp/data-shell/molecules/list.txt tmp/data-shell/molecules/methane.pdb tmp/data-shell/molecules/octane.pdb tmp/data-shell/molecules/pentane.pdb tmp/data-shell/molecules/propane.pdb tmp/data-shell/molecules/sort.txt

for i in hello 1 2 * bye; do echo $i; done

This command will print hello, 1, 2, every file and directory in the current directory, and bye, each on its own line.

var="sun"
echo $varshine    # prints nothing: the shell looks for a variable named 'varshine'
echo ${var}shine  # sunshine
echo "$var"shine  # sunshine

Redirection

By default, both normal output (stdout) and error messages (stderr) go to the terminal.

mkdirr tmp 2> error.txt

mkdirr is deliberately misspelled so the command fails; 2> redirects stderr, so the error message goes into the file.

mkdirr tmp 1>error.txt

1> redirects stdout to the file; the error message still goes to the terminal.

mkdirr tmp &> error.txt

&> redirects both stdout and stderr to the file.

/dev/null is a 'black hole': anything redirected there is discarded.
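For example, reusing the misspelled command from above so that it produces an error (a hedged sketch):

mkdirr tmp 2> /dev/null   # the error message is silently discarded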

mkdir tmp ; cd tmp

run the second command regardless of the result of the first command

mkdirr tmp && cd tmp

run the second command only when the first command succeeds

myvar="hello"
echo $myvar
echo ${myvar:offset}
echo ${myvar:offset:length}
echo ${myvar:2:3} # 3 characters starting from character 2
echo ${myvar/l/L} # replace the first match of a pattern
echo ${myvar//l/L} # replace all matches of a pattern

/l/L replace first l with L

//l/L replace all l with L

touch hello "hello there" "hi there" "good morning, everyone"
ls *\ * # list all files with a space in the name
for file in *\ *
do
echo mv "$file" "${file// /_}" # quote "$file" because the name contains spaces; remove 'echo' to actually rename
done

Question 20

Write a loop that concatenates all .pdb files in data-shell/molecules subdirectory into one file called allmolecules.txt, prepending each fragment with the name of the corresponding .pdb file, and separating different files with an empty line. Run the loop, make sure it works, bring it up with the "up arrow" key and paste in here.
for file in *.pdb
do
echo ----- $file >> allmolecules.txt
cat $file >> allmolecules.txt
echo >> allmolecules.txt
done
Create a loop that writes into 10 files chapter01.md, chapter02.md, ..., chapter10.md. Each file should contain chapter-specific lines, e.g. chapter05.md will contain exactly these lines:
## Chapter 05
This is the beginning of Chapter 05.
Content will go here.
This is the end of Chapter 05.
for i in {01..10}
do
echo "## Chapter $i" > chapter"$i".md
echo "This is the beginning of Chapter $i." >> chapter"$i".md
echo "Content will go here." >> chapter"$i".md
echo "This is the end of Chapter $i." >> chapter"$i".md
done

It is safer to put the variable expansions in quotes.

Scripts and functions

Shell scripts

.sh

#!/bin/bash

The shebang (which must be the first line of the script) tells the system which interpreter to use.

Run:

1) bash process.sh

2) make it executable and run it directly

attributes

rwx rwx rwx : the first set refers to ==the owner of the file (i.e., the user)==; the second set refers to ==the group that owns the file==; the third set refers to ==everybody else on the system==.

chmod u+x process.sh

Add executable permission of the user to this file.

then we run

./process.sh

Note: you cannot run the script by typing just process.sh, because the current directory is not in the PATH; that is why we run it as ./process.sh.

#!/bin/bash
# the line above is the shebang; it must come first and tells the system which interpreter runs the script

#echo hello world!

# print the first, second and third argument passed to the script
#echo $1 $2 $3

# print all arguments passed to the script
#echo the arguments are $@


for word in $@
do
echo $word
done

When you run the above script, pass it some arguments, e.g.

./process.sh A B C D E

Example

for molecule in $@
do
echo $molecule
head -3 $molecule
wc -l $molecule
echo -----
done

you could run:

./process.sh *.pdb
# or
./process.sh ethane.pdb cubane.pdb

Variables

myvar=3
export myvar1=5

With export, the variable is inherited by child processes (e.g. a script you run from this shell can access it). Without export, the variable is available only in the current shell, not inside the scripts it launches.

printenv or env will print all the environment variables.
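For example, a small hedged sketch:

printenv HOME     # print a single environment variable
env | grep PATH   # filter the full list of environment variables
echo $myvar1      # expand the exported variable defined above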

To unset a variable, use

unset myvar1

$HOME variable — home directory

$PATH variable —- the list of directories where the shell looks for executable commands

$PWD variable —- stores current directory

$PS1 variable —— stores the format of the prompt

For example, if $PS1 is [\u@\h \W]\$ , the prompt looks like [user144@login1 ~]$ .

Using which ls finds out where the ls executable is located.

Functions

Functions are similar to scripts, except that we reference a function by its name. Therefore, once defined, a function can be run in any directory, whereas running a script in another directory requires its path.

A convenient place to put all your function definitions is ~/.bashrc file which is run every time you start a new shell (local or remote).

# define a function 
greeting(){
echo hello!
}
# see the function
type greeting

# run the function
greeting

$@ show the values of arguments

$# show the number of arguments

We could write the functions in a .sh file, e.g. function.sh:

function greeting(){
    :   # function body goes here (a lone ':' keeps the empty skeleton valid)
}
function combine(){
    :   # function body goes here
}

after that we need to ==source function.sh== to load the definitions into the shell. Every time you change a function definition, you have to run source again.

A complicated example of Function

$RANDOM generates a random integer between 0 and 32767.

function combine(){
if [ $# -eq 0 ]; then    # note the space before ]
echo No arguments specified. Usage: combine file1 [file2 ...]
return 1
fi
dir=$RANDOM$RANDOM
mkdir $dir
mv "$@" $dir    # quote $@ in case file names contain spaces
echo moved these files into $dir successfully.
}

~/.bashrc: you can put the function definitions into this file. It is loaded every time a shell starts, so you don't need to source your functions manually each time.

Grep and find

Searching inside files with grep

# partial matching
grep day haiku.txt

# exact (whole-word) matching
grep -w day haiku.txt
  • grep is case-sensitive
  • -i —-> make grep case-insensitive
  • -n —-> show the line number of each match
  • -v —-> print all the lines that do NOT match
  • more flags: see man grep (examples below)
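A few hedged examples of these flags, assuming the same haiku.txt file:

grep -i day haiku.txt    # match day, Day, DAY, ...
grep -n day haiku.txt    # prefix each matching line with its line number
grep -v day haiku.txt    # print lines that do not contain 'day'
grep -in day haiku.txt   # flags can be combined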

Finding files with find

# search for a file named 'haiku.txt' in the current directory and its subdirectories
find . -name haiku.txt

# search for all files matching '*.txt' in the current directory and its subdirectories
find . -name "*.txt"


# If the current directory contains only haiku.txt, this behaves the same as the first command:
# bash expands *.txt (to haiku.txt) before find runs, so the pattern itself is never passed to find
find . -name *.txt

# find everything of type file
find . -type f

# find everything of type directory
find . -type d

# limit the search depth
find . -maxdepth 1 -type d

Combining find and grep

# $( ) runs find and substitutes its output into the command line
echo $(find . -name "*.txt")

for file in $(find . -name "*.txt")
do
grep day $file
done

# Another way

find . -name "*.txt" | xargs grep day

==xargs==

The xargs command in UNIX is a command line utility for building an execution pipeline from standard input. Whilst tools like grep can accept standard input as a parameter, many other tools cannot. Using xargs allows tools like echo and rm and mkdir to accept standard input as arguments.

For more, refer to https://shapeshed.com/unix-xargs/
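For instance, a hedged sketch with hypothetical file patterns:

find . -name "*.log" | xargs wc -l                 # count lines in every .log file found
find . -name "*.txt" | xargs -I {} cp {} backup/   # -I {} puts each file name where {} appears (backup/ must already exist)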

Text Manipulation

Text manipulation

Goals: learn sed and tr. tr stands for translate: it translates or deletes characters.

sed: stream editor for filtering and transforming text

GNU version of sed and BSD version of sed have different options and arguments.

Using Address Ranges: addresses let you target specific parts of a text stream. You can specify a line or even a range of lines.

# Using sed to convert all 'invisible' to 'supervisible'

sed 's/[Ii]nvisible/supervisible/g' wellinvisibleMan.txt > supervisible.txt
# 's' specifies the substitution operation; by default sed only converts the first match on each line, hence the 'g' (global) flag

For sed command

s/regexp/replacement : Attempt to match regexp (a regular expression) against the pattern space. If successful, replace the matched portion with replacement. The replacement may contain the special character & to refer to the portion of the pattern space that matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

==The character after the s is the delimiter. Pick one you like. As long as it’s not in the string you are looking for, anything goes.== And remember that you need three delimiters. If you get a “Unterminated `s’ command” it’s because you are missing one of them.

For example:

sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new

sed 's_/usr/local/bin_/common/bin_' <old >new

==`…` is the same as \$( ). The command inside will execute and its output will be substituted. It's better to use \$( ).==
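A quick hedged illustration:

echo "Today is `date`"    # backticks: run date and substitute its output
echo "Today is $(date)"   # same result; $( ) is easier to read and can be nested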

# Delete all punctuation marks

cat well.txt | tr -d "[:punct:]" > a1.txt

# Note: there are some character classes specific to the tr command
# For example: [:punct:] --- all punctuation characters
# [:digit:]---- all digits
# Refer to man tr

# uppercase to lowercase

cat a1.txt | tr '[:upper:]' '[:lower:]' > a2.txt

# replace space with \n

cat a2.txt | sed 's/ /\'$'\n/g' > a3.txt   # the $'\n' trick is needed for BSD (macOS) sed
# with GNU sed we could simply use
cat a2.txt | sed 's/ /\n/g' > a3.txt

# \'$'\n is a way to put a literal newline into the replacement

# remove empty lines
sed '/^$/d' a3.txt > a4.txt

# return unique words and their counts
cat a4.txt | sort | uniq -c > a5.txt

# sort by count, from most frequent to least frequent

cat a5.txt | sort -gr > a6.txt
# -g --->compare according to general numerical value
# -r ---> reverse the result of comparisons(from max to min)

More refer to:

1) https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux

2) https://www.digitalocean.com/community/tutorials/using-grep-regular-expressions-to-search-for-text-patterns-in-linux

3) https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/

4) https://www.geeksforgeeks.org/sed-command-linux-set-2/

Column-based Text processing with awk scripting language

awk '{print $1}' haiku.txt

# $1 --the first field, that is first word of each line in this case.
# $0 --full line

awk '{print}' haiku.txt # also print all

# Specify another separator between fields (the default is whitespace)

awk -Fa '{print $1}' haiku.txt # use 'a' as the separator


awk 'NR>1' haiku.txt # print records whose number (NR) is greater than 1, i.e. all lines except the first

awk 'NR>2' haiku.txt

awk 'NR>2 && NR<5' haiku.txt # print lines 3 and 4

Fuzzy finder

Fuzzy finder fzf is a third-party tool, not installed by default. With basic usage, it does interactive processing of standard input. At a more advanced level (not covered in the video below), it provides key bindings and fuzzy completion.

Fuzzy finder is an interactive filtering tool. We can also pipe its output into other commands.

#Example 1

nano $(find ./molecules -type f | fzf --height 40%)
# --height 40% sets the height of the fzf window to 40% of the screen
# Once you choose the file, it will be opened in nano

#Example 2
kill -9 $(ps aux | fzf | awk '{print $2}')
# ps lists all processes, fzf lets you pick one interactively, awk extracts the PID (column 2), then kill -9 terminates it

# Example 3
emacs $(find ~/Documents/ -type f | fzf)

Exercises - Day 2

ls -l $(which gcc)

grep ATOM $(find . -name "*.pdb")

Question 8 : Write a function archive() to replace directories with their gzipped archives.

function archive(){
for dir in $@
do
tar cvfz ${dir%/}.tar.gz $dir/ && /bin/rm -r $dir
# z compresses the archive with gzip; ${dir%/} strips a trailing slash from the directory name
# the command after && is executed only if the previous command succeeded
done
}

Question 9

Write a one-line command that finds 5 largest files in the current directory and prints only their names and file sizes in the human-readable format (indicating bytes, kB, MB, GB, …) in the decreasing file-size order. Hint: use find, xargs, and awk.

find . -type f | xargs ls -lSh | awk '{print $5 "   " $9}' | head -5

# For the ls command, -h ---> human-readable sizes (K, M, G...), -S ---> sort by file size, largest first

Appendix 1 Regular Expressions

Regular expressions can be used in almost every programming language.

Basic Regular expressions

Symbol Descriptions
==^== matches the start of the string/line
==$== matches the end of the string/line
. matches any single character
==\\== escapes special characters
() groups regular expressions
* matches the preceding character zero or more times
cat sample | grep ^a # search for lines that start with 'a'
cat sample | grep a$ # search for lines that end with 'a'
grep "d.g" file1 # search for words that start with 'd', end with 'g', and have any single character in the middle; six dots would mean six characters in the middle
grep "N[oen]n" file1 # 'o' or 'e' or 'n'; [a-e] is a range of characters
[^1-9] # ==> all characters except the numbers 1 to 9
grep "lak*" file1 # match any number of occurrences of the preceding letter 'k', including none

Interval Regular Expressions

Expression Descriptions
{n} Matches the preceding character appearing n times exactly
{n,m} Matches the preceding character appearing at least n times but no more than m times
{n,} Matches the preceding character only when it appears n times or more
cat sample | grep -E p\{2} # -E --> extended regular expressions
# find lines where 'p' appears exactly 2 times in a row (two consecutive 'p')

Extended Regular Expressions

Expression Description
\+ Matches one or more occurrence of the previous character
\\? Matches zero or one occurrence of the previous character
cat sample | grep "a\+t"
# Or
cat sample | grep -E "a+t" # with -E, + does not need the backslash

grep "ba\?b" file1

Brace expansion

The syntax for brace expansion is either a sequence or a comma separated list of items inside curly braces “{}”.

{aa,bb,cc,dd}  # ==> aa bb cc dd
{1..9} # ==> 1 2 3 4 5 6 7 8 9

Shorthand Characters

Character Description
\s match whitespaces (a space, a tab or line break)
\d match digits == [0-9]
\w match word characters (A-Z, a-z, 0-9) and _
\S opposite of \s
\D opposite of \d
\W opposite of \w
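These shorthand classes are Perl-style; with GNU grep they require the -P flag (a hedged sketch, assuming GNU grep):

grep -P '\d+' file1   # lines containing one or more digits
grep -P '^\S' file1   # lines whose first character is not whitespace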

Word Boundaries

Character Description
\\< used for beginning of the ==word==
\> used for end of the ==word==
\\b used for either beginning or end of the ==word==, could replace \\< or \>
grep "e\>" sample # locate words with 'e' at the end
grep "\<a" sample # locate words with 'a' at the beginning
grep "t\b" sample # locate words with 't' at the end
grep "\bt\|t\b" sample # locate words with 't' at the beginning or at the end
# Because grep uses basic regular expressions by default, we need to escape | . Or use -E or -P

Anchor

^ used for the beginning of the ==line==
$ used for the end of the ==line==

References

  1. https://www.guru99.com/linux-regular-expressions.html

Q&A

1. Single quote, double quote, brackets and backtick

  • ==`…` is the same as \$(…). The command will execute within brackets and then the results will be returned. It’s better to use \$( ).==

  • Single quotes will not interpolate anything, but double quotes will (variables, backticks, certain escapes). When you enclose characters or a variable in single quotes, they represent the literal value of the characters. Also, a single quote can't be used within another single quote.

    Example:

    num=3
    echo '$num'
    >> $num
    echo "$num"
    >> 3

To print a literal $ inside double quotes, escape it with \.

    • If you put a space between the string values, they will be treated as separate values and printed separately.

      printf '%s\n' "Ubuntu""Centos"
      >> UbuntuCentos
      printf '%s\n' "Ubuntu" "Centos"
      >> Ubuntu
      Centos

    References:

    https://linuxhint.com/bash_escape_quotes/